Skip to content

[FLINK-39094][table-planner] Avoid creating duplicate function instances in code generation#27613

Open
dylanhz wants to merge 2 commits intoapache:masterfrom
dylanhz:FLINK-39094
Open

[FLINK-39094][table-planner] Avoid creating duplicate function instances in code generation#27613
dylanhz wants to merge 2 commits intoapache:masterfrom
dylanhz:FLINK-39094

Conversation

@dylanhz
Copy link
Contributor

@dylanhz dylanhz commented Feb 14, 2026

What is the purpose of the change

Remove duplicate function instances in code generation.

Brief change log

Add a reusableFunctionTerms set in CodeGeneratorContext to cache function field terms, preventing redundant deep copies when the same function is added multiple times, similar to reusable converters and type serializers.

Verifying this change

Existing tests are enough.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 14, 2026

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

contextArgs: Seq[String] = null): String = {
val classQualifier = function.getClass.getName
val fieldTerm = CodeGenUtils.udfFieldName(function)
// check if function has been added before to avoid duplicate function instances
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the text of the PR says that existing tests are sufficient. It would be great to add a method level unit test to test out the logic around this specific change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added unit tests in CodeGeneratorContextTest that directly asserts references.size after calling addReusableFunction with the same/different functions, covering stateless dedup, stateful dedup, and stateful non-dedup cases.


// set of function instance term that will be added only once
private val reusableFunctionTerms: mutable.HashSet[String] = mutable.HashSet[String]()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we be sure in this?

Can we add a test with custom udf and counter inside checking that it was invoked once?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can we be sure in this?

The dedup key (fieldTerm) is derived from functionIdentifier(), which is the same key used by addReusableObjectInternal to generate member/init statements — so the dedup is consistent with existing behavior.

Can we add a test with custom udf and counter inside checking that it was invoked once?

I added unit tests in CodeGeneratorContextTest that directly asserts references.size after calling addReusableFunction with the same/different functions, covering stateless dedup, stateful dedup, and stateful non-dedup cases.

As for a counter-based test, addReusableObjectInternal creates instances via InstantiationUtil.deserializeObject that bypasses constructors, and open() was already deduplicated by LinkedHashSet before this fix, so neither counter can distinguish the before/after behavior. Let me know if you have better idea.

@snuyanzin
Copy link
Contributor

What about different arguments?
What about deterministic/non-deterministic?

@featzhang
Copy link
Member

Good Job on function performance

@davidradl
Copy link
Contributor

@flinkbot run azure

@dylanhz
Copy link
Contributor Author

dylanhz commented Feb 25, 2026

What about different arguments? What about deterministic/non-deterministic?

This fix targets redundant deep copies in references, not the function creation or invocation logic. Both before and after this fix, only one function instance is actually in effect at runtime — the change simply avoids unnecessary duplicate copies being added to the references array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants